
    Vision-Based Navigation III: Pose and Motion from Omnidirectional Optical Flow and a Digital Terrain Map

    An algorithm for pose and motion estimation using corresponding features in omnidirectional images and a digital terrain map is proposed. In a previous paper, such an algorithm was considered for a regular camera. Using a Digital Terrain (or Digital Elevation) Map (DTM/DEM) as a global reference enables recovering the absolute position and orientation of the camera. In order to do this, the DTM is used to formulate a constraint between corresponding features in two consecutive frames. In this paper, these constraints are extended to handle non-central projection, as is the case with many omnidirectional systems. The use of omnidirectional data is shown to improve the robustness and accuracy of the navigation algorithm. The feasibility of the algorithm is established through lab experiments with two kinds of omnidirectional acquisition systems: the first uses a polydioptric camera, and the second a catadioptric camera.

    Localization and Positioning Using Combinations of Model Views

    A method for localization and positioning in an indoor environment is presented. The method is based on representing the scene as a set of 2D views and predicting the appearances of novel views by linear combinations of the model views. The method is accurate under weak perspective projection. Analysis of this projection, as well as experimental results, demonstrates that in many cases it is sufficient to describe the scene accurately. When the weak perspective approximation is invalid, an iterative solution that accounts for the perspective distortions can be employed. A simple algorithm for repositioning, the task of returning to a previously visited position defined by a single view, is derived from this method.
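
    Under weak perspective, predicting a novel view as a linear combination of model views reduces to a linear least-squares fit over matched feature coordinates. The sketch below illustrates that core step only; the choice of basis (both coordinates of two model views plus a constant column) and all names are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of linear-combination-of-views prediction, assuming N feature
# points matched across two model views and a novel view.
import numpy as np

def fit_view_coefficients(x1, y1, x2, y2, x_novel, y_novel):
    """Fit coefficients expressing a novel view as a linear combination of
    two model views (weak-perspective assumption).  All inputs are length-N
    arrays of image coordinates of the matched points."""
    B = np.column_stack([x1, y1, x2, y2, np.ones_like(x1)])  # basis views + constant
    a, *_ = np.linalg.lstsq(B, x_novel, rcond=None)          # novel x ~ B @ a
    b, *_ = np.linalg.lstsq(B, y_novel, rcond=None)          # novel y ~ B @ b
    return a, b

def predict_view(x1, y1, x2, y2, a, b):
    """Predict feature locations in the novel view from the model views."""
    B = np.column_stack([x1, y1, x2, y2, np.ones_like(x1)])
    return B @ a, B @ b
```

    Localization then amounts to checking how well the predicted appearance matches the currently observed view.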

    Semantic Parsing of Colonoscopy Videos with Multi-Label Temporal Networks

    Following the successful debut of polyp detection and characterization, more advanced automation tools are being developed for colonoscopy. The new automation tasks, such as quality metrics or report generation, require understanding of the procedure flow, which includes activities, events, anatomical landmarks, etc. In this work we present a method for automatic semantic parsing of colonoscopy videos. The method uses a novel deep-learning multi-label temporal segmentation model trained in both supervised and unsupervised regimes. We evaluate the accuracy of the method on a test set of over 300 annotated colonoscopy videos, and use ablation studies to explore the relative importance of the method's components.
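
    As a rough illustration of the multi-label temporal setup (not the paper's actual architecture), the sketch below runs dilated 1-D convolutions over precomputed frame features and emits independent per-frame, per-label probabilities; all layer sizes and the feature dimension are assumptions.

```python
# Minimal sketch of per-frame multi-label temporal segmentation over
# precomputed frame features.  Architecture and sizes are illustrative only.
import torch
import torch.nn as nn

class MultiLabelTemporalNet(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_labels=20, layers=4):
        super().__init__()
        blocks, in_ch = [], feat_dim
        for i in range(layers):
            # Dilated convolutions grow the temporal receptive field.
            blocks += [nn.Conv1d(in_ch, hidden, kernel_size=3,
                                 padding=2 ** i, dilation=2 ** i),
                       nn.ReLU()]
            in_ch = hidden
        self.backbone = nn.Sequential(*blocks)
        self.head = nn.Conv1d(hidden, num_labels, kernel_size=1)

    def forward(self, feats):          # feats: (batch, time, feat_dim)
        x = feats.transpose(1, 2)      # -> (batch, feat_dim, time)
        logits = self.head(self.backbone(x))
        return logits.transpose(1, 2)  # (batch, time, num_labels)

# Per-frame, per-label probabilities; several labels (activity, landmark,
# event) can be active in the same frame.
model = MultiLabelTemporalNet()
probs = torch.sigmoid(model(torch.randn(1, 1000, 512)))
```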

    3D Human Body-Part Tracking and Action Classification Using a Hierarchical Body Model

    This paper presents a framework for hierarchical 3D articulated human body-part tracking and action classification. We introduce a Hierarchical Annealing Particle Filter (H-APF) algorithm, which applies nonlinear dimensionality reduction, mapping the high-dimensional data space to low-dimensional latent spaces, combined with a dynamic motion model and the Hierarchical Human Body Model. The improved annealing approach is used for propagation between different body models and sequential frames. The tracking algorithm generates trajectories in the latent spaces, which provide low-dimensional representations of the body poses observed during the motion. These trajectories are used to classify human motions. The tracking and classification algorithms were evaluated on the HumanEva-I, HumanEva-II, and other datasets involving more complicated motion types and transitions, and proved to be effective and robust. A comparison to other methods and error calculations are provided.
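
    The annealing idea, running the particle filter through progressively sharper versions of the likelihood, can be sketched as follows. The latent dynamics, the annealing schedule, and the particle count are illustrative assumptions; the hierarchical body model itself is not reproduced here.

```python
# Minimal sketch of one annealed particle-filter step in a latent pose space.
import numpy as np

def annealed_pf_step(particles, likelihood, dynamics_noise=0.05,
                     betas=(0.2, 0.4, 0.7, 1.0), rng=np.random.default_rng()):
    """particles: (N, D) latent pose hypotheses for the current frame.
    likelihood: function mapping (N, D) particles to (N,) non-negative
    observation scores.  Returns the particle set after all annealing layers."""
    for beta in betas:                       # soft -> sharp likelihood
        w = likelihood(particles) ** beta
        w = w / w.sum()
        idx = rng.choice(len(particles), size=len(particles), p=w)   # resample
        particles = particles[idx] + rng.normal(
            scale=dynamics_noise, size=particles.shape)              # diffuse
    return particles
```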

    Visual Tracking by Affine Kernel Fitting Using Color and Object Boundary

    Kernel-based trackers aggregate image features within the support of a kernel (a mask) regardless of their spatial structure. These trackers spatially fit the kernel (usually in location and in scale) such that a function of the aggregate is optimized. We propose a kernel-based visual tracker that exploits the constancy of color and the presence of color edges along the target boundary. The tracker estimates the best affinity of a spatially aligned pair of kernels, one of which is color-related and the other of which is object boundary-related. In a sense, this work extends previous kernel-based trackers by incorporating the object boundary cue into the tracking process and by allowing the kernels to be affinely transformed instead of only translated and isotropically scaled. These two extensions make for more precise target localization. Moreover, a more accurately localized target facilitates safer updating of its reference color model, further enhancing the tracker’s robustness. The improved tracking is demonstrated for several challenging image sequences.
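
    The aggregation step described above, pooling colors under a kernel whose support may be affinely shaped, can be sketched as a kernel-weighted color histogram. The Epanechnikov-style profile and the bin count are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of kernel-weighted color aggregation over an affine support.
import numpy as np

def kernel_color_histogram(patch, A, bins=8):
    """patch: (H, W, 3) uint8 image region centered on the target.
    A: 2x2 matrix mapping pixel offsets to the kernel's unit circle,
    i.e. the affine shape of the kernel support."""
    H, W, _ = patch.shape
    ys, xs = np.mgrid[0:H, 0:W]
    offsets = np.stack([xs - W / 2.0, ys - H / 2.0], axis=-1)   # (H, W, 2)
    u = offsets @ A.T                         # map offsets into kernel space
    r2 = (u ** 2).sum(axis=-1)
    w = np.clip(1.0 - r2, 0.0, None)          # Epanechnikov-style profile
    # Quantize colors and accumulate kernel weights per bin.
    idx = (patch // (256 // bins)).reshape(-1, 3)
    flat_idx = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    hist = np.bincount(flat_idx, weights=w.ravel(), minlength=bins ** 3)
    return hist / (hist.sum() + 1e-12)
```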

    Weakly-Supervised Surgical Phase Recognition

    A key element of computer-assisted surgery systems is phase recognition of surgical videos. Existing phase recognition algorithms require frame-wise annotation of a large number of videos, which is time-consuming and costly. In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction. Furthermore, we utilize two forms of weak supervision within our method: sparse timestamps or few-shot learning. The proposed algorithm enjoys low complexity and can operate in low-data regimes. We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos, demonstrating promising performance in multiple setups.
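
    One standard way to realize a random-walk solution with sparse timestamp supervision is harmonic label propagation over a frame-similarity graph; the sketch below uses that classical formulation as a stand-in for the paper's method, with a Gaussian affinity over assumed precomputed frame features.

```python
# Minimal sketch: propagate sparse timestamp labels over a frame-similarity
# graph by a random walk (harmonic function solution).
import numpy as np

def propagate_phase_labels(feats, seed_idx, seed_phase, num_phases, sigma=1.0):
    """feats: (T, D) per-frame features; seed_idx: indices of the sparse
    timestamps; seed_phase: their phase labels.  Returns per-frame phase
    probabilities of shape (T, num_phases)."""
    # Pairwise Gaussian affinities between frames.
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    P = W / W.sum(axis=1, keepdims=True)       # row-stochastic transition matrix

    T = feats.shape[0]
    labeled = np.zeros(T, dtype=bool)
    labeled[seed_idx] = True
    Y = np.zeros((T, num_phases))
    Y[seed_idx, seed_phase] = 1.0

    # Harmonic solution: unlabeled frames absorb labels from the seed frames.
    Puu = P[~labeled][:, ~labeled]
    Pul = P[~labeled][:, labeled]
    F = np.zeros_like(Y)
    F[labeled] = Y[labeled]
    F[~labeled] = np.linalg.solve(np.eye(Puu.shape[0]) - Puu, Pul @ Y[labeled])
    return F
```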

    Recognition by Functional Parts

    (Also cross-referenced as CAR-TR-703) We present an approach to function-based object recognition that reasons about the functionality of an object's intuitive parts. We extend the popular "recognition by parts" shape recognition framework to support "recognition by functional parts" by combining a set of functional primitives and their relations with a set of abstract volumetric shape primitives and their relations. Previous approaches have relied on more global object features, often ignoring the problem of object segmentation and thereby restricting themselves to range images of unoccluded scenes. We show how these shape primitives and relations can easily be recovered from superquadric ellipsoids which, in turn, can be recovered from either range or intensity images of occluded scenes. Furthermore, the proposed framework supports both unexpected (bottom-up) and expected (top-down) object recognition. We demonstrate the approach on a simple domain by recognizing a restricted class of hand tools from 2-D images.

    Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings

    Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically-relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically-relevant mistakes more than others. We demonstrate that this metric more closely aligns with clinician preferences on medical sentences compared to other metrics (WER, BLEU, METEOR, etc.), sometimes by wide margins. We collect a benchmark of 13 clinician preferences on 149 realistic medical sentences, called the Clinician Transcript Preference benchmark (CTP), demonstrate that CBERTScore more closely matches what clinicians prefer, and release the benchmark for the community to further develop clinically-aware ASR metrics.
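
    The core idea, penalizing clinically relevant transcription errors more heavily, is sketched below with a simple weighted word error rate rather than the BERT-based metric itself; the clinical term list and weights are made-up assumptions, and this is not CBERTScore.

```python
# Minimal sketch of a weighted word error rate where errors on clinical
# terms cost more than errors on other words.
def weighted_wer(reference, hypothesis, clinical_terms, clinical_weight=5.0):
    """Levenshtein-style WER with higher cost for errors on clinical terms."""
    ref, hyp = reference.split(), hypothesis.split()
    cost = lambda w: clinical_weight if w.lower() in clinical_terms else 1.0
    # Dynamic programming over weighted edit operations.
    D = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        D[i][0] = D[i - 1][0] + cost(ref[i - 1])           # deletion
    for j in range(1, len(hyp) + 1):
        D[0][j] = D[0][j - 1] + 1.0                        # insertion
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0.0 if ref[i - 1] == hyp[j - 1] else cost(ref[i - 1])
            D[i][j] = min(D[i - 1][j - 1] + sub,
                          D[i - 1][j] + cost(ref[i - 1]),
                          D[i][j - 1] + 1.0)
    total = sum(cost(w) for w in ref)
    return D[len(ref)][len(hyp)] / max(total, 1e-9)

# A swapped drug name is penalized far more than an ordinary word error.
print(weighted_wer("administer 10 mg warfarin", "administer 10 mg water",
                   clinical_terms={"warfarin", "mg"}))
```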

    Blind decomposition of transmission light microscopic hyperspectral cube using sparse representation

    In this paper, we address the problem of fully automated decomposition of hyperspectral images for transmission light microscopy. The hyperspectral images are decomposed into spectrally homogeneous compounds, and the resulting compounds are described by their spectral characteristics and optical density. We present the multiplicative physical model of image formation in transmission light microscopy, justify the reduction of the hyperspectral image decomposition problem to a blind source separation problem, and provide a method for hyperspectral restoration of the separated compounds. In our approach, dimensionality reduction using principal component analysis (PCA) is followed by a blind source separation (BSS) algorithm. The BSS method is based on a sparsifying transformation of the observed images and a relative Newton optimization procedure. The presented method was verified on hyperspectral images of biological tissues and compared to an existing approach based on nonnegative matrix factorization. Experiments showed that the presented method is faster and better separates the biological compounds from imaging artifacts. The results obtained in this work may be used for improving automatic microscope hardware calibration and computer-aided diagnostics.
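
    The pipeline described above can be sketched end to end: the multiplicative image-formation model is linearized by converting intensities to optical density, PCA reduces dimensionality, and a BSS step unmixes the compounds. FastICA stands in here for the paper's sparsifying-transform and relative Newton BSS; that substitution and all parameters are assumptions.

```python
# Minimal sketch: optical-density conversion, PCA, then blind source separation.
import numpy as np
from sklearn.decomposition import PCA, FastICA

def decompose_hyperspectral_cube(cube, white_ref, n_compounds=3):
    """cube: (H, W, B) transmission intensities; white_ref: (B,) blank
    reference spectrum.  Returns per-pixel compound maps of shape (H, W, K)."""
    H, W, B = cube.shape
    # Beer-Lambert: optical density is linear in compound concentrations.
    od = -np.log(np.clip(cube / white_ref, 1e-6, None)).reshape(-1, B)
    od_red = PCA(n_components=n_compounds).fit_transform(od)    # dimensionality reduction
    sources = FastICA(n_components=n_compounds,
                      random_state=0).fit_transform(od_red)     # blind source separation
    return sources.reshape(H, W, n_compounds)
```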